Capstone Project - The Battle of Neighborhoods

A description of the problem and a discussion of the background

A contractor wants to start his own business in Chennai. His problem is to find a suitable location that is visited by a large number of people.

Chennai, the capital of Tamil Nadu, is the fourth largest metropolitan city in India. The city is commonly classified into three regions: North Chennai, Central Chennai and South Chennai. On the basis of composition, it is divided into four major parts: North, Central, South and West. North Chennai is primarily an industrial area, although some parts are residential. Central Chennai is the commercial heart of the city and its downtown area. South Chennai and West Chennai, previously predominantly residential, are fast turning into commercial areas, hosting a large number of IT and financial companies along the GST Road, OMR and NH 48. North Madras ends at Red Hills.

A description of the data and how it will be used to solve the problem

  1. Scrape data from the Wikipedia page "https://en.wikipedia.org/wiki/List_of_neighbourhoods_of_Chennai", which lists the neighbourhoods of Chennai.

  2. geocoder.arcgis - The ArcGIS provider of the geocoder library, used to get the geographical coordinates (latitude and longitude) of the neighbourhoods.

  3. Foursquare API - Used to get the venues around each neighbourhood of Chennai.
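The first two data steps could look roughly like the sketch below. It is a minimal, offline-runnable illustration under stated assumptions, not the notebook's actual code: it assumes the neighbourhood names appear as link text inside list items of the Wikipedia page's HTML (the live page also contains navigation and reference links that would need filtering), it leaves out the download step (for example with `urllib.request.urlopen`), and `neighbourhood_names`, `geocode_all` and `arcgis_latlng` are hypothetical helper names. The geocoder is passed in as a function so the logic can be exercised without network access; the default wrapper calls `geocoder.arcgis(address).latlng`.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional, Sequence, Tuple


class ListLinkParser(HTMLParser):
    """Collects the text of <a> tags that appear inside <li> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.li_depth = 0      # how many <li> elements we are currently inside
        self.in_link = False   # True while inside an <a> within an <li>
        self.texts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.li_depth += 1
        elif tag == "a" and self.li_depth > 0:
            self.in_link = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "li" and self.li_depth > 0:
            self.li_depth -= 1
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.texts[-1] += data


def neighbourhood_names(html: str) -> List[str]:
    """Return unique, non-empty link texts found in list items, in page order."""
    parser = ListLinkParser()
    parser.feed(html)
    cleaned = (text.strip() for text in parser.texts)
    return list(dict.fromkeys(name for name in cleaned if name))


def arcgis_latlng(address: str) -> Optional[Sequence[float]]:
    """Default geocoder: the third-party `geocoder` package's ArcGIS provider."""
    import geocoder  # imported lazily so the rest runs without the package
    return geocoder.arcgis(address).latlng  # [lat, lng], or None if not found


def geocode_all(
    names: List[str],
    geocode: Callable[[str], Optional[Sequence[float]]] = arcgis_latlng,
    city: str = "Chennai, Tamil Nadu, India",
) -> Dict[str, Tuple[float, float]]:
    """Map each neighbourhood name to (lat, lng), skipping failed lookups."""
    coords = {}
    for name in names:
        latlng = geocode(f"{name}, {city}")
        if latlng:  # None (or an empty result) means the lookup failed
            coords[name] = (latlng[0], latlng[1])
    return coords
```

Appending the city to each name keeps the geocoder from matching a same-named place elsewhere; neighbourhoods that fail to geocode are simply dropped from the result.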

Methodology

In this notebook, the neighbourhood addresses are first converted into their equivalent latitude and longitude values. The Foursquare API is then used to explore the venues around each neighbourhood of Chennai, and the results are used to find the most common venue categories in each neighbourhood. The neighbourhoods are then grouped into clusters with the k-means clustering algorithm. Finally, the Folium library is used to visualize the neighbourhoods of Chennai and the resulting clusters.
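The Foursquare step can be sketched as follows. This assumes the legacy v2 `venues/explore` endpoint and its usual response layout (`response["response"]["groups"][0]["items"]`, each item holding a `venue` with a `categories` list); newer Foursquare accounts may need the Places API instead. The version string, radius and limit are placeholder values, and the helper names are hypothetical. To find the most common categories it simply counts them with `collections.Counter`, a simpler substitute for a one-hot encoding with pandas.

```python
import json
from collections import Counter
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the legacy Foursquare v2 "explore" endpoint.
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def explore_url(lat: float, lng: float, client_id: str, client_secret: str,
                version: str = "20180605", radius: int = 500, limit: int = 100) -> str:
    """Build the request URL for venues within `radius` metres of (lat, lng)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # placeholder API version date
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def fetch_json(url: str) -> dict:
    """Download and decode a JSON response (needs network access and valid keys)."""
    with urlopen(url) as resp:
        return json.load(resp)


def venue_categories(response: dict) -> List[str]:
    """Primary category name of every venue in an explore response."""
    items = response["response"]["groups"][0]["items"]
    names = []
    for item in items:
        categories = item["venue"].get("categories", [])
        if categories:  # some venues carry no category at all
            names.append(categories[0]["name"])
    return names


def most_common_categories(categories: List[str], top_n: int = 5) -> List[str]:
    """The `top_n` most frequent categories; ties keep first-seen order."""
    return [name for name, _ in Counter(categories).most_common(top_n)]
```

For each geocoded neighbourhood, `venue_categories(fetch_json(explore_url(lat, lng, ID, SECRET)))` would yield its category list, which `most_common_categories` then reduces to a short profile.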

k-means clustering. There are many models for clustering, and k-means is one of the simplest. It partitions the data into k clusters by repeatedly assigning each point to its nearest centroid and then moving each centroid to the mean of the points assigned to it. Despite its simplicity, k-means is widely used in data science applications, and it is especially useful when you need to quickly discover insights from unlabeled data.
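To make the algorithm concrete, here is a plain-Python sketch of Lloyd's algorithm, the standard way k-means is computed: assign every point to its nearest centroid, move each centroid to the mean of its points, and repeat until the assignments stop changing. It is an illustration rather than the notebook's code (in practice scikit-learn's `KMeans` is the usual choice). In this project each point would be a neighbourhood's vector of venue-category frequencies; the function name `kmeans` and the random initialisation are assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: List[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm; returns (cluster label per point, centroids)."""
    rng = random.Random(seed)
    # Initialise with k input points chosen at random without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no assignment changed, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels, centroids
```

Because the outcome depends on the initial centroids, library implementations add smarter seeding and restarts; scikit-learn's `KMeans`, for example, uses k-means++ initialisation by default and exposes restarts through `n_init`.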

Some real-world applications of k-means include:

  1. customer segmentation,
  2. understanding what the visitors of a website are trying to accomplish,
  3. pattern recognition, and
  4. data compression.

Results

Map showing the neighbourhoods of Chennai

Chennai_Neighbourhoods.png

Map showing the clusters

Chennai_Clusters.png
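Maps like these can be drawn with Folium, as described in the Methodology. The sketch below assumes each neighbourhood is available as a `(name, latitude, longitude, cluster)` tuple; the colour palette, marker styling and the approximate city-centre coordinates are illustrative choices, and `marker_specs` and `render_map` are hypothetical names. Folium is imported inside `render_map` only, so the colour-assignment logic runs without it installed.

```python
from typing import Dict, Iterable, List, Tuple

# Illustrative palette; any CSS colour names work for CircleMarker.
PALETTE = ["red", "blue", "green", "purple", "orange", "darkred", "cadetblue"]

Row = Tuple[str, float, float, int]  # (name, latitude, longitude, cluster)


def marker_specs(rows: Iterable[Row]) -> List[Dict]:
    """One marker description per neighbourhood, coloured by cluster label."""
    return [
        {
            "location": [lat, lng],
            "popup": f"{name} (cluster {cluster})",
            "color": PALETTE[cluster % len(PALETTE)],  # wraps if many clusters
        }
        for name, lat, lng, cluster in rows
    ]


def render_map(rows: Iterable[Row], center=(13.0827, 80.2707), zoom_start: int = 11):
    """Build a Folium map with one circle marker per neighbourhood.

    `center` is an assumed, approximate coordinate for central Chennai.
    """
    import folium  # third-party; imported here so marker_specs works without it

    fmap = folium.Map(location=list(center), zoom_start=zoom_start)
    for spec in marker_specs(rows):
        folium.CircleMarker(
            location=spec["location"],
            radius=5,
            popup=spec["popup"],
            color=spec["color"],
            fill=True,
            fill_color=spec["color"],
            fill_opacity=0.7,
        ).add_to(fmap)
    return fmap
```

Calling `render_map(rows).save("Chennai_Clusters.html")` writes the result as an interactive HTML map.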

Discussion

  1. Cluster 0 consists of music venues, women's stores and gaming cafes.
  2. Cluster 1 consists mostly of venues such as ATMs, restaurants, pharmacies, train stations and bus stations.
  3. Cluster 2 consists of ATMs, gardens and furniture stores.
  4. Cluster 3 consists of train stations, platforms, bus stations, pharmacies and fast food restaurants.
  5. Cluster 4 consists of Indian restaurants, department stores, train stations, metro stations and multiplexes.
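Cluster summaries like the ones above can be derived by pooling the venue categories of all neighbourhoods in a cluster and keeping the most frequent ones. A minimal sketch, assuming a list of cluster labels and a parallel list of per-neighbourhood category lists (the input layout and the name `cluster_profiles` are hypothetical):

```python
from collections import Counter, defaultdict
from typing import Dict, List


def cluster_profiles(labels: List[int], venue_lists: List[List[str]],
                     top_n: int = 3) -> Dict[int, List[str]]:
    """Most frequent venue categories per cluster.

    labels[i] is the cluster of neighbourhood i; venue_lists[i] holds the venue
    category names found around that neighbourhood.
    """
    counts: Dict[int, Counter] = defaultdict(Counter)
    for label, venues in zip(labels, venue_lists):
        counts[label].update(venues)  # pool categories across the cluster
    return {
        label: [name for name, _ in counter.most_common(top_n)]
        for label, counter in sorted(counts.items())
    }
```

Pooling raw counts lets neighbourhoods with many venues weigh more heavily; averaging per-neighbourhood frequencies instead would give every neighbourhood equal say.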

Conclusion

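The contractor can open his business in Cluster 1, Cluster 3 or Cluster 4. These clusters combine transport hubs (train, bus and metro stations) with everyday services such as ATMs, pharmacies and restaurants, which makes them the places that attract most of the population.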
